Vocal communication is costly. Individuals learn and produce signals in noisy environments, all while avoiding predation from species adapted to detect them. As a result, communication systems tend to be optimized for efficiency, or the benefit that they bestow relative to the costs of learning and producing them:
\[\begin{equation} \textrm{efficiency} \propto \frac{\sum{\pi}}{C_{L} + \sum{C_{P}}} \tag{1.1} \end{equation}\]
where \(\sum{\pi}\) is the lifetime benefit of the communication system, \(C_{L}\) is the cost of learning it, and \(\sum{C_{P}}\) is the lifetime cost of producing it (Gruber et al., 2022). Note that efficiency here is optimized over the course of cultural and biological evolution, as \(\sum{\pi}\), \(C_{L}\), and \(\sum{C_{P}}\) are incomputable by the individual. This framing is consistent with understandings of efficiency in linguistics (Gibson et al., 2019): the famous “principle of least effort” can be thought of as a minimization of these costs (Zipf, 1949).
On the other hand, simpler sounds that are easier to learn and produce vary across fewer dimensions and are less distinctive from one another (Miton & Morin, 2021). Signals need to be distinguishable to be functional (De Boer, 2000), and complexity expands the possibilities within a signal space in a way that enhances functionality (e.g. to communicate concepts in language, or attract mates in birdsong) (Fitch, 2000). But there is, of course, a limit. Simulations show that cultural evolution plateaus when complex behaviors are too costly (Mesoudi, 2011). Evidence from linguistics, animal behavior, and cultural evolution suggest that communication systems generally evolve to balance this complexity-efficiency trade-off (Gibson et al., 2019; Gruber et al., 2022; Hahn et al., 2020; Jaeger & Tily, 2011; Youngblood et al., 2023).
Notions of complexity vary widely and are sometimes uncorrelated across levels of analysis (Benedict & Najar, 2019; Mikula et al., 2018). In the birdsong literature, for example, complexity is usually described at one of three levels: syllables (individual sounds within songs), songs (sequences of syllables), or repertoires (full set of syllables or songs that a bird produces). Syllable and repertoire complexity are approximated with measures of production cost (e.g. frequency bandwidth, number of transitions in pitch (Benedict & Najar, 2019; Youngblood & Lahti, 2022)) and learning cost (e.g. diversity of unique syllables or songs (Garamszegi et al., 2005; Soma & Garamszegi, 2011)), respectively. Song complexity, on the other hand, is often characterized by measures of hierarchical or combinatorial structure (Kershenbaum et al., 2014; Sainburg et al., 2019). Structured signals are more compressible and learnable (Gibson et al., 2019; Raviv et al., 2021; Verhoef, 2012), which allows more information to “pass through the bottleneck” of cultural transmission (Gruber et al., 2022; Kirby, 2017). The former increases both production and learning costs, whereas the latter decreases learning costs. Both notions of complexity, however, should boost the benefits of communication by expanding the signal space. Syllable complexity allows for greater diversity in syllable types, and song complexity allows those syllable types to be combined into a wider array of song types.
The effect of this complexity-efficiency trade-off has been confirmed by experiments showing that artificial languages gain efficiency (Fay & Ellison, 2013; Fedzechkina et al., 2012; Roberts et al., 2015) and structure (Kirby et al., 2008; Kirby et al., 2015; Nölle et al., 2018; Raviv et al., 2019) as they are culturally transmitted and used by participants. Efficiency gains can occur from language use alone (Motamedi et al., 2019), but the emergence of structure seems to require both use and cultural transmission (Carr et al., 2017; Kirby et al., 2015). Most experimental work has focused on compositional structure, or the lexical relationships that determine meaning. Birdsong has no mapping between signals and meanings and thus no compositional structure (Miyagawa et al., 2013), but iterated learning may still lead to combinatorial and hierarchical structure (Kirby et al., 2014). Studies of continuous whistled communication systems that resemble birdsong have also detected increases in hierarchical and combinatorial structure (Tamariz & Kirby, 2016; Verhoef, 2012; Verhoef et al., 2014, 2016).
Birdsong experiments yield similar results. Zebra finches raised in isolation sing atypical songs, but over several generations they converge towards wild-type song by shortening the longest syllables and gaining spectral structure (Fehér et al., 2009). In a replication of Fehér et al. (2009), isolate song also increased in efficiency: song bouts and motifs became shorter and denser, and syllables became shorter and lower in amplitude (Diez & MacDougall-Shackleton, 2020). Zebra finches also appear to learn songs more accurately when they have more acoustic variation and less extreme features, possibly as a way to balance the diversity of sequences and efficiency of syllables (Tchernichovski et al., 2021). Even foraging behaviors that are culturally transmitted become more efficient over time, a tendency that is accelerated by population turnover because new group members are more likely to adopt the more efficient behavior (Chimento et al., 2021).
Pressure for compression and efficiency lead to regularities in organization that are so universal in human language that they are referred to as linguistic laws. The three most commonly-studied of these are Zipf’s rank-frequency law, Zipf’s law of abbreviation, and Menzerath’s law (Semple et al., 2022).
Zipf’s rank-frequency law predicts that the frequency of an item will be proportional to the inverse of its rank, a relationship that holds for most, if not all, of the world’s languages (Piantadosi, 2014). In this study, I will focus on Mandelbrot’s more flexible parameterization of Zipf’s rank-frequency law (see Analysis) (Mandelbrot, 1953, 1962), which is its most common form in contemporary linguistics (Piantadosi, 2014). Non-human animal communication systems, including bird and cetacean vocalizations (Allen et al., 2019; Briefer et al., 2010; Cody et al., 2016; Hailman et al., 1985; McCowan et al., 1999) and lizard courtship displays (Martins, 1994), exhibit more redundancy than languages, leading to a convex rank-frequency relationship that is better captured by the Zipf-Mandelbrot distribution. Zipf and Mandelbrot both interpreted this rank-frequency law as resulting from a minimization of production and perception costs (Mandelbrot, 1953; Zipf, 1949), and there are models showing that it can be derived from communicative efficiency (Ferrer-i-Cancho, 2016; Manin, 2009; Salge et al., 2015). However, some of these models assume that signals map to objects or concepts which is not the case in birdsong, and other causes are still debated (Piantadosi, 2014) (Allegrini et al., 2004; Corral & Serra, 2020; Simon, 1955). Even though there is still uncertainty about its causes, the presence of Zipf’s rank-frequency law in non-human communication systems has been interpreted as evidence for both communicative efficiency and information content (Genty & Byrne, 2010; Kershenbaum et al., 2021; McCowan et al., 1999; Stepanov et al., 2023).
Zipf’s law of abbreviation predicts that common items will tend to be shorter than rare items because their production cost is lower (Zipf, 1949). This negative correlation between frequency and duration is widespread in both written (Bentz & Ferrer-i-Cancho, 2016) and spoken language (Petrini et al., 2023), and has also been observed in writing systems (Koshevoy et al., 2023) and non-human communication like chimpanzee gestures (Heesen et al., 2019) and bird and primate vocalizations (Favaro et al., 2020; Huang et al., 2020; Semple et al., 2010). In some cases, like hyrax vocalizations, Zipf’s law of abbreviation is absent for duration but is present for other measures of production cost such as amplitude (Demartsev et al., 2019). The explanation for Zipf’s law of abbreviation is simple: when common items have a lower production cost than rare items then the overall production cost of signals goes down (Ferrer-i-Cancho et al., 2013). Shorter signals carry other benefits as well, such as reduced predation risk and reverberation in the environment (Ferrer-i-Cancho et al., 2013).
Menzerath’s law predicts that longer sequences will be comprised of shorter items to balance production costs (Menzerath, 1954). This negative correlation between sequence length and item length is found at various levels of analysis in language (e.g. clauses in sentences, morphemes in words) (Cramer, 2005; Eroglu, 2013; Stave et al., 2021) as well as in music (Boroda & Altmann, 1991). In non-human communication, Menzerath’s law appears to be present in chimpanzee gesture (Heesen et al., 2019) and in some primate and bird vocalizations (Favaro et al., 2020; Fedurek et al., 2017; Gustison et al., 2016; Huang et al., 2020; James et al., 2021), although its detection may depend on the type of vocalization (Clink & Lau, 2020) and level of analysis (Clink et al., 2020). The explanation for Menzerath’s law is a simple extension of Zipf’s law of abbreviation: when production costs are increased in one domain (e.g. song sequence length) they should be decreased in another (e.g. syllable duration).
Beyond linguistic laws, there are two proxy measure of linguistic structure that have recently been investigated in non-human communication: small-world structure (Watts & Strogatz, 1998) and mutual information decay (Sainburg et al., 2019).
Small-world networks are highly clustered and have short average path lengths, so that it only takes a few steps to jump between any pair of nodes (think “six degrees of separation”) (Humphries & Gurney, 2008; Watts & Strogatz, 1998). These sorts of networks are quite common in biological and social systems (Humphries & Gurney, 2008; Watts & Strogatz, 1998), including language. For example, networks of neighboring words in sentences and co-occurring words in thesauruses exhibit small-world structure (Cancho & Solé, 2001; Motter et al., 2002; Steyvers & Tenenbaum, 2005). One explanation for this is that short distances between concepts make language easier to process (Beckage et al., 2011). More broadly, though, small-worldness is thought to reflect general systematic structure and recurrence, which in turn improve the compressibility and learnability of information (Allen et al., 2019; Kirby et al., 2015; Raviv et al., 2021). It is also hypothesized to reflect the emergence of syntactic structure over time (Solé et al., 2010), as both nightingales and children in more advanced stages of vocal development have greater small-world structure in their transition networks (Beckage et al., 2011; Weiss et al., 2014). Humpback whales (Allen et al., 2019) and several songbird species (Beckage et al., 2011; Cody et al., 2016; Hedley, 2016; Sasahara et al., 2012; Weiss et al., 2014) all exhibit small-worldness in their song syntax to a similar degree: \(SWI \sim 1.69-4.7\). The small-worldness index (\(SWI\)) is based on the level of clustering and the distances between nodes in the network of signal transitions, where values \(> 1\) are consistent with small-world structure (Humphries & Gurney, 2008) (see Analysis).
Mutual information is a measure of dependency, or the amount of information that the presence of one thing has about another. In the context of language, past words provide information that can help predict future words above chance levels (i.e. “do you need anything from the” \(\to\) “store”). Many large language models, for example, are optimized for precisely this task (Schrimpf et al., 2021; Shlegeris et al., 2022). Intuitively, a word contains more information about the very next word than the one that comes after it (Pothos & Juola, 2007), but correlations can be detected even at very long distances of hundreds of words (Altmann et al., 2012; Alvarez-Lacalle et al., 2006). The rate at which the mutual information of words decreases with increasing distance can provide clues about underlying syntactic structure. Simple models of grammar that assume Markov processes (i.e. next word depends only on current word) lead to exponential decay in mutual information with distance, whereas hierarchical processes (i.e. word order comes from syntactic rules, common view going back to Chomsky (1957)) lead to power-law decay (Alvarez-Lacalle et al., 2006; Li, 1990; Lin & Tegmark, 2017). Some past studies have shown that mutual information decay in real language often follows a power-law (Ebeling & Pöschel, 1994; Li, 1990; Lin & Tegmark, 2017), but other work suggests a more complicated picture somewhere between these accounts (Frank et al., 2012; Melnyk et al., 2005). In German, Italian, Japanese, and English, mutual information decay is exponential at short distances and fits a power-law at long distances, suggesting that sequences are generated by a combination of Markovian and hierarchical processes (Sainburg et al., 2019; Sainburg et al., 2022). This pattern has also been documented in four bird species, suggesting that sequential organization of song is more complex than previously thought (Sainburg et al., 2019).
Law/Property | Level | Description | Explanation |
|---|---|---|---|
Zipf's rank-frequency law | Syllables | Frequency follows a power-law with rank | Unclear |
Zipf's law of abbreviation | Syllables | Common items are shorter | Production cost |
Menzerath's law | Both | Longer sequences are comprised of shorter items | Production cost |
Small-worldness index | Songs | Sequences have general systematic structure | Learning cost |
Mutual information decay | Songs | Sequences have Markovian or hierarchical structure | Learning cost |
In languages, the boundaries between signals at different levels of hierarchy (e.g. phonemes, words) are apparent to the humans who use them. In non-human communication systems, categorizing signals into types is its own challenge. For relatively small datasets it is possible to manually inspect recorded signals and assign types. Increasingly, though, researchers are turning to automated methods such as hierarchical clustering (Ju et al., 2019; Youngblood & Lahti, 2022), dimension reduction (Sainburg et al., 2020), and machine learning (Merino Recalde, 2023; Rivera et al., 2023) that classify signals into types based on their acoustic features. These methods reduce subjectivity in classification and enable people to work with much more data, but they also require tuning. For example, hierarchical clustering, which is my preferred classification method, requires the user to choose an appropriate threshold below which the “branches” of the “tree” are combined into types (see Figure 2.2). This threshold effectively controls the granularity of clustering: higher values over-lump signals into fewer categories, while lower values over-split signals into more categories. The granularity of an analysis may influence the kinds of patterns that can detected. Philosophers of cultural evolution call this the “grain problem”—some statistical patterns may be more apparent at certain levels of analysis (Charbonneau & Bourrat, 2021). In Gibbon vocalizations, for example, evidence for language-like compression is present for notes but not for phrases (Clink et al., 2020). Some have even proposed that adherence to Zipf’s laws could be used a assess whether classification algorithms have identified relevant units in animal communication systems (Kershenbaum et al., 2016).
The aim of this study is to assess the evidence for language-like efficiency and structure in house finch song across three levels of granularity in syllable clustering. By doing so, I hope to (1) identify which features of birdsong may be most subject to the complexity-efficiency trade-off, and (2) determine how clustering decisions affect the manifestation of linguistic laws in non-human communication systems. The data for this study come from a large corpus of house finch songs collected between 1975 and 2019 (Ju et al., 2019; Mundinger, 1975; Youngblood & Lahti, 2022). House finch song is an excellent model for these questions for several reasons. First, house finch song is socially learned (Mann et al., 2020) and culturally evolves (Mundinger, 1975), and thus should be subject to information compression. Second, male house finches are more likely to learn complex syllables (Youngblood & Lahti, 2022)—a content bias that may be an adaptation to female preferences for complexity. In house finches, males that sing longer songs at a faster rate are more attractive (Nolan & Hill, 2004) and have higher reproductive performance (Mennill et al., 2006), and courtship songs are longer and contain more syllable types (Ciaburri & Williams, 2019). Because these measures of complexity relate to production and learning costs, they may increase pressure for efficiency in other domains such as duration. Finally, house finch song is known to be subject to efficiency constraints. When house finches tutored by canaries reproduce the trills of their foster parents they are slower and much shorter (Mann et al., 2020). In the presence of urban noise, house finches increase the frequency of their vocalizations to minimize competition with the lower frequency sounds (Bermúdez-Cuamatzin et al., 2009; Bermúdez‐Cuamatzin et al., 2023; Fernández-Juricic et al., 2005), in spite of the fact that urban birds evolve longer bills for foraging that should lead to lower frequencies (Giraudeau et al., 2014).
Unless otherwise stated, all models were fit in STAN using the brms package in R (Bürkner, 2017), with 20,000 iterations across four MCMC chains. Model comparison was conducted using the widely-applicable information criterion (WAIC), a more robust alternative to AIC (Watanabe, 2021), and a Bayesian form of R2 (Gelman et al., 2019). Prior specifications, model diagnostics, and full model output for all analyses can be found in the Supplementary Information. I use the diagnostic heuristics recommended by Gelman et al. (2014): effective sample size \(> 100\), and \(1 < \hat{R} < 1.1\).
The recordings used in this study were collected in 1975 (Mundinger, 1975), 2012 (Ju et al., 2019), and 2019 (Youngblood & Lahti, 2022) in the New York metropolitan area (Brooklyn, Bronx, Queens, and Nassau County). Recordings from 1975 were collected with a Nagra III reel-to-reel tape recorder and a Sennheiser 804 shotgun microphone and converted to digital files (32 bit, 96 kHz) by the Cornell Lab of Ornithology in 2013 (Ju et al., 2019). Recordings from 2012 (16 bit, 44.1 kHz) and 2019 (32 bit, 48 kHz) were collected with a Marantz PD661 solid-state recorder and a Sennheiser ME66 shotgun microphone. The recordings from 1975 were downsampled to 48 kHz prior to analysis (Luscinia processes both 44.1 kHz and 48 kHz files). In all three years special precautions were taken to avoid recording the same bird twice Ju et al. (2019). Each site was visited only once. Within a site, only one individual was recorded within a 160 m radius until they stopped singing or flew away.
All songs were analyzed by Youngblood & Lahti (2022) using Luscinia, a database and analysis program developed specifically for birdsong (https://rflachlan.github.io/Luscinia/). Songs were analysed with a high-pass threshold of 2000 Hz, a maximum frequency of 9000 Hz and 5 dB of noise removal. 965 songs (26.2%) were excluded from the analysis due to high levels of noise or overlap with other bird species. Continuous traces with more than 20 ms between them were classified as syllables (Mundinger, 1975). Some syllables were further separable into elements, or traces with less than 20 ms between them. For this study, I used the mean frequency trace (mean frequency in each 1 ms bin) for each syllable in every song analyzed by Youngblood & Lahti (2022).
Figure 2.1 shows an example of the first ten syllables of a house finch song recorded by Mundinger (1975) and analyzed in Luscinia, where the blue line is the mean frequency over time.
Figure 2.1: The first 10 syllables from a song recorded by Mundinger (1975) and analysed in Luscinia. Each red bar corresponds to a syllable, and each green number corresponds to an element within that syllable. The blue traces represent mean frequency. In this song, syllable 3 and syllable 7 were classified as the same syllable type during dynamic time warping, and all other syllables are unique types. Reprinted from Youngblood & Lahti (2022).
The data used in this study comes directly from Youngblood and Lahti’s repository (Youngblood & Lahti, 2022).
#load data from repository for Youngblood & Lahti (2022)
load(url("https://github.com/masonyoungblood/TransmissionBias/raw/master/example/data.RData"))
Mean frequency was log-transformed prior to clustering (Lachlan et al., 2018). First, the normalized distances between all of the syllables were calculated via dynamic time warping with a window size of 10 (10% of the average signal length) using the dtwclust package in R (Sardá-Espinosa, 2019). A window size of 10% of the signal length is commonly used in speech processing research and seems to be a practical upper limit for many applications (Ratanamahatana & Keogh, 2005). Infinite distances (0.19% of cases) caused by comparisons of syllables with extreme signal length differences were assigned the maximum observed distance value. Next, hierarchical clustering and dynamic tree cut were used to cluster the syllables into types Roginek (2018). Hierarchical clustering was conducted with the UPGMA method implemented in the fastcluster package in R (Müllner, 2013), and dynamic tree cut was conducted with the dynamicTreeCut package in R (Langfelder et al., 2008).
For dynamic tree cut, I ran the hybrid algorithm with a minimum cluster size of 1 to maximize the representation of rare syllable types. The deep split parameter (\(DS\)) determines the granularity of clustering by controlling the value of two other parameters: the maximum core scatter (\(MCS\)), which controls the maximum within-group variation, and the minimum gap (\(MG\)), which controls the minimum between-group variation (Ju, 2016; Langfelder et al., 2008). \(DS = \{0, 1, 2, 3, 4\}\) correspond to \(MCS = \{0.64, 0.73, 0.82, 0.91, 0.95\}\), and \(MG = (1 - MCS)*0.75\). In their analyses of house finch song, Ju et al. (2019) and Roginek (2018) manually set \(MCS = 1\) and \(MG = 0.5\) while Youngblood & Lahti (2022) used \(DS = 3\) (corresponding to \(MCS = 0.91\) and \(MG = 0.0675\)), all of which led to a similar number of syllable types. I will follow Youngblood & Lahti (2022) in using the simpler deep split parameter to control granularity in clustering, as this approach is recommended by the creators of dynamic tree cut (Langfelder et al., 2008) and has been widely used for a variety of applications (Liu et al., 2022; Zhao et al., 2020) including vocal analysis (Burkett et al., 2015).
#load library
library(dtwclust)
#log transform each syllable individually
data$meanfreq <- sapply(1:nrow(data), function(x){log(data$meanfreq[[x]])})
#z-score normalization across entire dataset
m <- mean(unlist(data$meanfreq))
sd <- sd(unlist(data$meanfreq))
data$meanfreq <- sapply(1:nrow(data), function(x){(data$meanfreq[[x]]-m)/sd})
#set window size
window <- round(mean(lengths(data$meanfreq))*0.10) #10
#calculate distances
distances <- proxy::dist(data$meanfreq, method = "dtw_basic", window.size = window, normalize = TRUE)
#strip everything from distances
attr(distances, "dimnames") <- NULL
attr(distances, "method") <- NULL
attr(distances, "call") <- NULL
class(distances) <- "matrix"
#replace infinite distances with the maximum value
distances[which(distances == Inf)] <- max(distances[-which(distances == Inf)])
#convert to format compatible with hclust
dtw_dist <- stats::as.dist(distances)
#run clustering
clustering <- fastcluster::hclust(dtw_dist, method = "average")
#save clustering
save(clustering, file = "data_models/clustering.RData")
#run hybrid tree cut
hybrid_cut <- parallel::mclapply(0:4, function(x){dynamicTreeCut::cutreeDynamic(dendro = clustering, distM = distances, method = "hybrid", minClusterSize = 1, deepSplit = x)}, mc.cores = 5)
#save hybrid tree cut
save(hybrid_cut, file = "data_models/hybrid_cut.RData")
The results of clustering can be seen in Figure 2.2.
Figure 2.2: The results of hierarchical clustering. The colored bars below the dendrogram correspond to the categories assigned to each syllable when deep split is 2 (over-lumping), 3 (baseline), and 4 (over-splitting).
Once syllable types were identified by hierarchical clustering, I followed Ju et al. (2019) and Youngblood & Lahti (2022) in calculating the following syllable-level acoustic features for further analysis: average frequency (Hz), minimum frequency (Hz), maximum frequency (Hz), bandwidth (Hz), duration (ms), concavity (changes in the sign of the slope of the mean frequency trace/ms) and excursion (cumulative absolute change in Hz/ms). Concavity and excursion are both indicators of syllable complexity (Ju et al., 2019; Podos et al., 2016). Concavity was calculated after smoothing the mean frequency trace using a polynomial spline with a smoothing parameter of 5 (Youngblood & Lahti, 2022). These acoustic features were averaged across all of the observations of each syllable type.
#load data from clustering
load("data_models/clustering.RData")
load("data_models/hybrid_cut.RData")
#calculate concavity
inds <- 1:nrow(data)
inds <- inds[-which(lengths(data$meanfreq) < 6)] #remove things that are too short to analyze but have a concavity of 0
concavity <- rep(0, nrow(data))
for(i in inds){
temp <- pspline::sm.spline(data$meanfreq[[i]], spar = 5)
temp <- diff(c(temp$ysmth)) #extract first derivative (slopes)
concav <- 0
for(j in 2:length(temp)){
if((temp[[j]] < 0 & temp[[j-1]] > 0) | (temp[[j]] > 0 & temp[[j-1]] < 0)){
concavity[i] <- concavity[i] + 1
}
}
concavity[i] <- concavity[i]/length(temp)
}
#calculate excursion
excursion <- rep(0, nrow(data))
for(i in inds){
temp <- data$meanfreq[[i]]
excursion[i] <- sum(sapply(2:length(temp), function(x){abs(temp[x]-temp[x-1])}))/length(temp)
}
#add to data table
data$concavity <- concavity
data$excursion <- excursion
#calculate other measures from mean frequency trace
data$average <- sapply(1:nrow(data), function(x){mean(data$meanfreq[[x]])})
data$min <- sapply(1:nrow(data), function(x){min(data$meanfreq[[x]])})
data$max <- sapply(1:nrow(data), function(x){max(data$meanfreq[[x]])})
data$bandwidth <- sapply(1:nrow(data), function(x){max(data$meanfreq[[x]])-min(data$meanfreq[[x]])})
data$duration <- sapply(1:nrow(data), function(x){length(data$meanfreq[[x]])})
#add syllable types to data file
data$cluster_0 <- as.numeric(hybrid_cut[[1]])
data$cluster_1 <- as.numeric(hybrid_cut[[2]])
data$cluster_2 <- as.numeric(hybrid_cut[[3]])
data$cluster_3 <- as.numeric(hybrid_cut[[4]])
data$cluster_4 <- as.numeric(hybrid_cut[[5]])
#add year
data$year <- NA
data$year[which(substring(data$individual, 1, 4) == "1975")] <- 1
data$year[which(substring(data$individual, 1, 4) == "2012")] <- 2
data$year[which(substring(data$individual, 1, 4) == "2019")] <- 3
#add counts
syl_freqs_0 <- as.data.frame(table(data$cluster_0))
syl_freqs_1 <- as.data.frame(table(data$cluster_1))
syl_freqs_2 <- as.data.frame(table(data$cluster_2))
syl_freqs_3 <- as.data.frame(table(data$cluster_3))
syl_freqs_4 <- as.data.frame(table(data$cluster_4))
data$count_0 <- sapply(1:nrow(data), function(x){syl_freqs_0$Freq[which(syl_freqs_0$Var1 == data$cluster_0[x])]})
data$count_1 <- sapply(1:nrow(data), function(x){syl_freqs_1$Freq[which(syl_freqs_1$Var1 == data$cluster_1[x])]})
data$count_2 <- sapply(1:nrow(data), function(x){syl_freqs_2$Freq[which(syl_freqs_2$Var1 == data$cluster_2[x])]})
data$count_3 <- sapply(1:nrow(data), function(x){syl_freqs_3$Freq[which(syl_freqs_3$Var1 == data$cluster_3[x])]})
data$count_4 <- sapply(1:nrow(data), function(x){syl_freqs_4$Freq[which(syl_freqs_4$Var1 == data$cluster_4[x])]})
#add song lengths
song_freq_table <- as.data.frame(table(data$song))
data$song_length <- sapply(1:nrow(data), function(x){song_freq_table$Freq[which(song_freq_table$Var1 == data$song[x])]})
#create data frame of syllable types
syl_types <- lapply(0:4, function(x){
col_id <- which(names(data) == paste0("cluster_", x))
temp <- data.frame(type = sort(unique(as.data.frame(data)[, col_id])), count = NA, concavity = NA, excursion = NA, average = NA, min = NA, max = NA, bandwidth = NA, duration = NA)
#get average measures for each syllable type
for(i in 1:nrow(temp)){
inds <- which(as.data.frame(data)[, col_id] == temp$type[i])
temp$count[i] <- length(inds)
temp$concavity[i] <- median(data$concavity[inds])
temp$excursion[i] <- median(data$excursion[inds])
temp$average[i] <- median(data$average[inds])
temp$min[i] <- median(data$min[inds])
temp$max[i] <- median(data$max[inds])
temp$bandwidth[i] <- median(data$bandwidth[inds])
temp$duration[i] <- median(data$duration[inds])
}
return(temp)
})
#save processed version of data
processed_data <- list(data = data, syl_types = syl_types)
save(processed_data, file = "data_models/processed_data.RData")
Mandelbrot’s generalization of Zipf’s rank-frequency law takes the following form (Mandelbrot, 1953, 1962):
\[\begin{equation} f(r) = \frac{c}{(r + \beta)^\alpha} \tag{2.1} \end{equation}\]
where \(f(r)\) is the normalized frequency at each rank \(r\), and \(\alpha\) and \(\beta\) are parameters that control slope and convexity (respectively). According to Izsák (2006), the bounds of (2.1) are \(\alpha > 1\), \(\beta > -1\), and \(c > 1\). When \(\beta = 0\), this function simplifies to the original form of Zipf’s rank-frequency law: \(f(r) \propto 1/r^\alpha\). \(c\) is usually a normalization constant (Izsák, 2006; Mačutek, 2022; Mouillot & Lepretre, 2000) defined as:
\[\begin{equation} c = \sum_{i=1}^{\infty}\frac{1}{(r + \beta)^\alpha} \tag{2.2} \end{equation}\]
In practice, this form of Zipf’s rank-frequency law is notoriously difficult to fit to data due to strong correlations between \(\alpha\) and \(\beta\), which in turn determine \(c\) (Izsák, 2006; Mačutek, 2022; Mouillot & Lepretre, 2000). Here, I use a simplified version of (2.1) that treats \(c\) as a third parameter that is estimated alongside \(\alpha\) and \(\beta\) (Ausloos, 2014), as has been done in studies of chickadee calls (Ficken et al., 1994; Freeberg & Lucas, 2012; Hailman, 1994), which should be interpreted as an approximation of the Zipf-Mandelbrot distribution.
I fit (2.1) to the rank-frequency distributions of the syllable classifications from each level of deep split in two batches. First, I fit all three parameters to approximate Mandelbrot’s versions of the rank-frequency law. Then, I set \(\beta = 0\) to approximate Zipf’s original formulation.
#load library
library(brms)
#load processed data and rename
load("data_models/processed_data.RData")
data <- processed_data$data
syl_types <- processed_data$syl_types
rm(processed_data)
#get frequencies for three levels of deep split
freqs_a <- as.numeric(sort(table(data$cluster_2), decreasing = TRUE))/nrow(data)
freqs_b <- as.numeric(sort(table(data$cluster_3), decreasing = TRUE))/nrow(data)
freqs_c <- as.numeric(sort(table(data$cluster_4), decreasing = TRUE))/nrow(data)
data_a <- data.frame(freq = freqs_a, rank = 1:length(freqs_a))
data_b <- data.frame(freq = freqs_b, rank = 1:length(freqs_b))
data_c <- data.frame(freq = freqs_c, rank = 1:length(freqs_c))
#set priors
zipf_priors <- prior(normal(0, 10), lb = 1, nlpar = "a") +
prior(normal(0, 10), lb = 0, nlpar = "c")
mand_priors <- prior(normal(0, 10), lb = 1, nlpar = "a") +
prior(normal(0, 10), lb = -1, nlpar = "b") +
prior(normal(0, 10), lb = 0, nlpar = "c")
#fit the zipf models
zipf_fit_a <- brm(bf(freq ~ c/(rank^a), a + c ~ 1, nl = TRUE),
data = data_a, prior = zipf_priors,
iter = 5000, cores = 4)
zipf_fit_b <- brm(bf(freq ~ c/(rank^a), a + c ~ 1, nl = TRUE),
data = data_b, prior = zipf_priors,
iter = 5000, cores = 4)
zipf_fit_c <- brm(bf(freq ~ c/(rank^a), a + c ~ 1, nl = TRUE),
data = data_c, prior = zipf_priors,
iter = 5000, cores = 4)
#fit the mandelbrot models
mand_fit_a <- brm(bf(freq ~ c/((rank + b)^a), a + b + c ~ 1, nl = TRUE),
data = data_a, prior = mand_priors,
iter = 5000, cores = 4)
mand_fit_b <- brm(bf(freq ~ c/((rank + b)^a), a + b + c ~ 1, nl = TRUE),
data = data_b, prior = mand_priors,
iter = 5000, cores = 4)
mand_fit_c <- brm(bf(freq ~ c/((rank + b)^a), a + b + c ~ 1, nl = TRUE),
data = data_c, prior = mand_priors,
iter = 5000, cores = 4)
#store and save all relevant output
zipf_rf_models <- list(data = list(ds_2 = data_a,
ds_3 = data_b,
ds_4 = data_c),
zipf = list(ds_2 = summary(zipf_fit_a)$fixed,
ds_3 = summary(zipf_fit_b)$fixed,
ds_4 = summary(zipf_fit_c)$fixed),
mand = list(ds_2 = summary(mand_fit_a)$fixed,
ds_3 = summary(mand_fit_b)$fixed,
ds_4 = summary(mand_fit_c)$fixed),
prior = prior_summary(mand_fit_a),
waic = data.frame(zipf = c(waic(zipf_fit_a)$waic, waic(zipf_fit_b)$waic, waic(zipf_fit_c)$waic),
mand = c(waic(mand_fit_a)$waic, waic(mand_fit_b)$waic, waic(mand_fit_c)$waic)),
r2 = c(bayes_R2(zipf_fit_a)[1], bayes_R2(zipf_fit_b)[1], bayes_R2(zipf_fit_c)[1],
bayes_R2(mand_fit_a)[1], bayes_R2(mand_fit_b)[1], bayes_R2(mand_fit_c)[1]))
save(zipf_rf_models, file = "data_models/zipf_rf_models.RData")
Figure 2.3: The relationship between rank (x-axis) and count (y-axis) at each level of deep split (left, center, and right). The blue and orange lines denote the expected distributions according to Zipf’s rank-frequency law (blue) and Mandelbrot’s extension of it (orange).
As can be seen in Figure 2.3, the observed frequency distribution is consistent with the Zipf-Mandelbrot distribution at all three levels of granularity (\(R^2 = \{0.995, 0.995, 0.995\}\) at \(DS = \{2, 3, 4\}\)), but is only weakly consistent with the the original form of Zipf’s rank frequency law (\(R^2 = \{0.945, 0.777, 0.523\}\) at \(DS = \{2, 3, 4\}\)). \(R^2\) is the most commonly reported goodness-of-fit measure for both forms of Zipf’s rank-frequency law, and is generally \(> 0.8\) in studies of human language (Linders & Louwerse, 2023). The \(WAIC\) values and fitted parameters for all six models can be found in the Supplementary Information.
Zipf’s law of abbreviation predicts that common items will be shorter in duration than rarer ones. Rather than focusing duration alone, I explored whether the frequency of syllables is negatively correlated with four different measures of production cost: duration (ms), bandwidth (Hz), concavity (changes in the sign of the slope of the mean frequency trace/ms) and excursion (cumulative absolute change in Hz/ms). As mentioned in Clustering, concavity and excursion are both indicators of syllable complexity (Ju et al., 2019; Podos et al., 2016).
For each level of deep split and each measure of production cost, I constructed a log-normal model with the measure in question as the outcome variable, count as the predictor variable, and syllable type as a varying intercept. The alternative formulation, a Poisson model with count as the outcome variable, does not allow for the correct random effects structure (e.g. counts are identical across observations of the same syllable type). Zipf’s law of abbreviation predicts that there will be a strong effect of count on each measure of production cost.
#load packages
library(brms)
#load processed data and rename
load("data_models/processed_data.RData")
data <- processed_data$data
#set priors, a has lower bound on intercept (because outcome is only positive) while b does not
zipf_priors_a <- prior(normal(0, 4), lb = 0, class = "Intercept") +
prior(normal(0, 0.5), class = "b") +
prior(normal(0, 0.5), lb = 0, class = "sd") +
prior(normal(0, 0.5), lb = 0, class = "sigma")
zipf_priors_b <- prior(normal(0, 4), class = "Intercept") +
prior(normal(0, 0.5), class = "b") +
prior(normal(0, 0.5), lb = 0, class = "sd") +
prior(normal(0, 0.5), lb = 0, class = "sigma")
#run duration model across all three deep split values
duration_model_2 <- brm(log(duration) ~ scale(count_2) + (1|cluster_2), data = data, prior = zipf_priors_a, iter = 5000, cores = 4)
duration_model_3 <- brm(log(duration) ~ scale(count_3) + (1|cluster_3), data = data, prior = zipf_priors_a, iter = 5000, cores = 4)
duration_model_4 <- brm(log(duration) ~ scale(count_4) + (1|cluster_4), data = data, prior = zipf_priors_a, iter = 5000, cores = 4)
#run bandwidth model across all three deep split values
bandwidth_model_2 <- brm(log(bandwidth) ~ scale(count_2) + (1|cluster_2), data = data, prior = zipf_priors_a, iter = 5000, cores = 4)
bandwidth_model_3 <- brm(log(bandwidth) ~ scale(count_3) + (1|cluster_3), data = data, prior = zipf_priors_a, iter = 5000, cores = 4)
bandwidth_model_4 <- brm(log(bandwidth) ~ scale(count_4) + (1|cluster_4), data = data, prior = zipf_priors_a, iter = 5000, cores = 4)
#run concavity model across all three deep split values, with 0.0001 added to avoid log-transformation issues
concavity_model_2 <- brm(log(concavity + 0.0001) ~ scale(count_2) + (1|cluster_2), data = data, prior = zipf_priors_b, iter = 5000, cores = 4)
concavity_model_3 <- brm(log(concavity + 0.0001) ~ scale(count_3) + (1|cluster_3), data = data, prior = zipf_priors_b, iter = 5000, cores = 4)
concavity_model_4 <- brm(log(concavity + 0.0001) ~ scale(count_4) + (1|cluster_4), data = data, prior = zipf_priors_b, iter = 5000, cores = 4)
#run excursion model across all three deep split values, with 0.0001 added to avoid log-transformation issues
excursion_model_2 <- brm(log(excursion + 0.0001) ~ scale(count_2) + (1|cluster_2), data = data, prior = zipf_priors_b, iter = 5000, cores = 4)
excursion_model_3 <- brm(log(excursion + 0.0001) ~ scale(count_3) + (1|cluster_3), data = data, prior = zipf_priors_b, iter = 5000, cores = 4)
excursion_model_4 <- brm(log(excursion + 0.0001) ~ scale(count_4) + (1|cluster_4), data = data, prior = zipf_priors_b, iter = 5000, cores = 4)
#combine into object that contains model estimates, diagnostics, and prior specifications, and then save
zipf_models <- list(duration = list(summary(duration_model_2)$fixed, summary(duration_model_3)$fixed, summary(duration_model_4)$fixed, prior_summary(duration_model_2)),
bandwidth = list(summary(bandwidth_model_2)$fixed, summary(bandwidth_model_3)$fixed, summary(bandwidth_model_4)$fixed, prior_summary(bandwidth_model_2)),
concavity = list(summary(concavity_model_2)$fixed, summary(concavity_model_3)$fixed, summary(concavity_model_4)$fixed, prior_summary(concavity_model_2)),
excursion = list(summary(excursion_model_2)$fixed, summary(excursion_model_3)$fixed, summary(excursion_model_4)$fixed, prior_summary(excursion_model_2)))
save(zipf_models, file = "data_models/zipf_models.RData")
Model | DS | Est. | Err. | 2.5% | 97.5% |
|
|---|---|---|---|---|---|---|
duration ~ count | 2 | -0.36 | 0.14 | -0.63 | -0.10 | * |
3 | -0.46 | 0.06 | -0.58 | -0.34 | * | |
4 | -0.42 | 0.03 | -0.48 | -0.35 | * | |
bandwidth ~ count | 2 | -0.55 | 0.11 | -0.75 | -0.34 | * |
3 | -0.68 | 0.06 | -0.79 | -0.57 | * | |
4 | -0.64 | 0.03 | -0.69 | -0.58 | * | |
concavity ~ count | 2 | -0.02 | 0.10 | -0.21 | 0.17 | |
3 | -0.06 | 0.04 | -0.15 | 0.02 | ||
4 | -0.07 | 0.03 | -0.12 | -0.02 | * | |
excursion ~ count | 2 | -0.26 | 0.07 | -0.41 | -0.11 | * |
3 | -0.33 | 0.04 | -0.41 | -0.25 | * | |
4 | -0.32 | 0.02 | -0.36 | -0.27 | * |
The results of the log-normal models can be seen in Table 2.1. Duration, bandwidth, and excursion had strong negative effects on count at all three levels of granularity. Concavity had no effect on count at the first two levels of granularity, but had a very weak negative effect at the third level of granularity.
Figure 2.4: The relationship between four measures of production cost (x-axis) and count (y-axis) for each level of deep split (left, center, right). Each point shows the median value for a syllable type, so the orange best fit lines are from a simple Poisson model (count ~ cost) rather than the full log-normal model.
These patterns are apparent in the plots of median production costs and counts in Figure 2.4. The lack of effect for concavity may be due in part to the weaker right-skew.
Menzerath’s law predicts that longer sequences will be comprised of smaller items. Importantly, Menzerath’s law is sometimes detected in random sequences from null models (Ferrer-i-Cancho et al., 2014; G. Torre et al., 2021; Tanaka-Ishii, 2021). I think that there are two sorts of null models that make sense in this context: (1) random sequences with the same number of syllables as real songs, and (2) pseudorandom sequences that match both the number of syllables and cumulative duration in time. James et al. (2021) interpret the latter as approximating simple motor constraints—Menzerath’s law resulting from efficiency in production alone—where stronger effects in the real data would indicate additional mechanisms (e.g. communicative efficiency). In this study, I compare the real data against each of these to assess both the presence of Menzerath’s law and whether it is beyond what would be expected from production constraints.
The production constraint model of James et al. (2021) works as follows. For each iteration of the model, a pseudorandom sequence was produced for each real song in the dataset. Syllables were randomly sampled (with replacement) from the population until the duration of the random sequence exceeded the duration of the real song. If the difference between the duration of the random sequence and the real song was <50% of the duration of the final syllable, then the final syllable was kept in the sequence. Otherwise, it was removed. Each iteration of the model produces a set of random sequences with approximately the same distribution of durations as the real data.
To estimate the strength of Menzerath’s law, I constructed a log-normal model with syllable duration as the outcome variable, song length in number of syllables as the predictor variable, and the song as a varying intercept (Safryghin et al., 2022). The production constraint model removes variation accounted for by the year and individual bird, so song is the only varying intercept that is appropriate for comparison. Syllable durations do not differ across years. This model was used to estimate the strength of the effect of song length on syllable duration in both the real data and 10 simulated datasets from both null models. The brm_multiple function from the brms package in R was used to fit a single model to the the 10 simulated datasets from each null model and produce a combined posterior distribution (Bürkner, 2017).
This analysis differs from James et al. (2021) in two key ways: (1) I use the actual duration of syllables rather than a single median value for each song, and (2) I compare the full posterior distributions rather than point estimates of effects. These decisions should yield more conservative conclusions about the differences between the real and simulated data. Note that syllable type is also not incorporated into the modeling, so this is the only analysis that is not conducted across multiple levels of deep split.
#load packages
library(brms)
#load processed data
load("data_models/processed_data.RData")
data <- processed_data$data
syl_types <- processed_data$syl_types
rm(processed_data)
#get unique songs and durations
unique_songs <- unique(data$song)
individuals <- as.character(sapply(unique_songs, function(q){data$individual[which(data$song == q)][1]}))
song_durs <- as.numeric(sapply(unique_songs, function(x){sum(data$duration[which(data$song == x)])}))
#create 10 simulated datasets from the simple null model
simple_null_sims <- lapply(1:10, function(x){
temp <- data
temp$duration <- sample(temp$duration)
return(temp)
})
#create production constraint model based on james et al. (2021)
prod_null <- function(){
#get shuffled versions of data
sampled_inds <- sample(nrow(data))
sampled_durs <- data$duration[sampled_inds]
#double everything so the algorithm can loop back around if it needs to
sampled_inds <- c(sampled_inds, sampled_inds)
sampled_durs <- c(sampled_durs, sampled_durs)
#set starting points and sampling window for cumulative duration calculation
ind <- 1
m <- 1
window <- 200
#create empty sequences list to fill
sequences <- list()
#iterate through unique songs and create sequences
while(m <= length(unique_songs)){
#get cumulative durations from ind to the end of the sampling window
cum_durs <- cumsum(sampled_durs[ind:(ind + window)])
#set the targets as the value at which the cumulative durations surpass the desired duration
rel_position <- min(which(cum_durs > song_durs[m])) #relative position in the cumulative duration window
abs_position <- ind + rel_position - 1 #absolute position in the sampled vectors
#if syllable duration surpasses song duration at first position, do exception handling
if(rel_position == 1){
#add the single-unit sequence and iterate ind
sequences[[m]] <- data.frame(song = unique_songs[m], individual = individuals[m], duration = sampled_durs[ind], song_length = rel_position)
ind <- ind + 1
} else{
#if the difference between the cumulative duration at the target and the desired duration is less that half of the length of target
if(cum_durs[rel_position] - song_durs[m] < sampled_durs[abs_position]/2){
#add the sequence and iterate ind
sequences[[m]] <- data.frame(song = unique_songs[m], individual = individuals[m], duration = sampled_durs[ind:abs_position], song_length = rel_position)
ind <- abs_position + 1
} else{
#otherwise, add abbreviated sequence and iterate ind
sequences[[m]] <- data.frame(song = unique_songs[m], individual = individuals[m], duration = sampled_durs[ind:(abs_position - 1)], song_length = rel_position - 1)
ind <- abs_position
}
}
#iterate m
m <- m + 1
}
#return combined sequences
return(do.call(rbind, sequences))
}
#create 10 simulated datasets from the production constraint model
prod_null_sims <- lapply(1:10, function(x){prod_null()})
#set priors
menz_priors <- prior(normal(0, 3), lb = 0, class = "Intercept") +
prior(normal(0, 0.1), class = "b") +
prior(normal(0, 0.5), lb = 0, class = "sd") +
prior(normal(0, 0.5), lb = 0, class = "sigma")
#run the actual model on the data
model <- brm(log(duration) ~ scale(song_length) + (1|song), data = data, prior = menz_priors, iter = 5000, cores = 4)
#fit a single model across the 10 simple null datasets
simple_null_models <- brm_multiple(log(duration) ~ scale(song_length) + (1|song), data = simple_null_sims, prior = menz_priors, iter = 5000, cores = 7)
#fit a single model across the 10 production constraint datasets
prod_null_models <- brm_multiple(log(duration) ~ scale(song_length) + (1|song), data = prod_null_sims, prior = menz_priors, iter = 5000, cores = 7)
#combine into object that contains posterior samples, model estimates and diagnostics, and prior specifications, and then save
menz_models <- list(actual = list(estimates = summary(model)$fixed, samples = posterior_samples(model, pars = c("scalesong_length"))$b_scalesong_length),
simple_null = list(estimates = summary(simple_null_models)$fixed, samples = posterior_samples(simple_null_models, pars = c("scalesong_length"))$b_scalesong_length, rhats = simple_null_models$rhats),
prod_null = list(estimates = summary(prod_null_models)$fixed, samples = posterior_samples(prod_null_models, pars = c("scalesong_length"))$b_scalesong_length, rhats = prod_null_models$rhats),
prior = prior_summary(model))
save(menz_models, file = "data_models/menz_models.RData")
Figure 2.5: The left panel shows the relationship between song length in number of syllables (x-axis) and syllable duration (y-axis), with a best fit line from the fitted lognormal model. The right panel shows the posterior distribution of the effect of song length on duration for the real data (orange) compared to 10 simulated datasets from the simple (blue) and production (green) null models. The combined posteriors for the simulated datasets are based on 2,500 posterior samples from each of the 10 models.
The results of the log-normal model indicate that song length has a negative effect on syllable duration (Mean Estimate: -0.052; 95% CI: [-0.066, -0.038]) (left panel of Figure 2.5). The posterior distribution for this effect from the model fit to the actual data (orange) is more negative than the posterior distributions from 10 simulated datasets from the production constraint null model of James et al. (2021) (green) and a simple random null model (blue), although there is notable overlap between the lower tail of the production constraing model and the upper tail of the actual data (right panel of Figure 2.5).
The small-worldness index (\(SWI\)) is a ratio based on the clustering coefficient (\(C\)) and average path length (\(L\)) (Humphries & Gurney, 2008):
\[\begin{equation} SWI = \frac{C/C_{rand}}{L/L_{rand}} \tag{2.3} \end{equation}\]
where \(C_{rand}\) and \(L_{rand}\) are calculated from random networks of the same size as the real network. Values of \(SWI > 1\) are consistent with small-world structure (Humphries & Gurney, 2008), which is thought to reflect general systematic structure and thus compressibility (Allen et al., 2019; Raviv et al., 2021).
In this study, I followed Allen et al. (2019) in calculating \(SWI\) from the unweighted directed network of all syllable transitions in the population (lower panel of Figure 2.6A) to allow comparison with previous studies of non-human song. Note that I am using the abbreviation \(SWI\) rather than the more commonly used \(S\) to differentiate it from entropy (\(\hat{S}\), see Mutual Information).
Importantly, if the frequency distribution of syllable types is very skewed, then random sequences could exhibit small-world structure simply because of clustering around common types. To avoid this confound, \(SWI\) was calculated 1,000 times from the real data and 1,000 times from random sequences with the same distribution of types. Variation in observed \(SWI\) is due to \(C_{rand}\) and \(L_{rand}\) being computed from different random networks. It is also possible to compute S with a weighted undirected network (Muldoon et al., 2016), but I opted for the unweighted form to allow for comparison with other whale and birdsong studies.
#load library
library(igraph)
#load processed data
load("data_models/processed_data.RData")
data <- processed_data$data
syl_types <- processed_data$syl_types
rm(processed_data)
#get number of unique songs
unique_songs <- unique(data$song)
#set number of iterations
iters <- 1000
#compute swi at each value of deep split
small_world <- parallel::mclapply(0:4, function(ds){
#get column ID corresponding to deep split value
if(ds == 0){syls <- data$cluster_0}
if(ds == 1){syls <- data$cluster_1}
if(ds == 2){syls <- data$cluster_2}
if(ds == 3){syls <- data$cluster_3}
if(ds == 4){syls <- data$cluster_4}
#extract all of the structured directed sequences
directed_sequences <- do.call(rbind, lapply(1:length(unique_songs), function(i){
sequence <- syls[which(data$song == unique_songs[i])]
return(matrix(c(sequence[1:(length(sequence)-1)], sequence[2:length(sequence)]), ncol = 2))
}))
#convert to factor
directed_sequences <- matrix(as.numeric(factor(directed_sequences)), ncol = 2)
#extract "iters" sets of shuffled sequences for null models
shuffled_sequences <- lapply(1:iters, function(x){
shuffled_syls <- sample(syls)
do.call(rbind, lapply(1:length(unique_songs), function(i){
sequence <- shuffled_syls[which(data$song == unique_songs[i])]
return(matrix(c(sequence[1:(length(sequence)-1)], sequence[2:length(sequence)]), ncol = 2))
}))
})
#construct structured graph, ignoring edge weights
graph <- simplify(graph_from_edgelist(unique(directed_sequences), directed = TRUE))
#get small world indices for it
structured_vals <- sapply(1:iters, function(x){
temp <- sample_gnm(n = length(V(graph)), m = length(E(graph)), directed = TRUE)
(transitivity(graph)/transitivity(temp))/(mean_distance(graph)/mean_distance(temp))
})
#get small world indices for null models
null_vals <- sapply(1:iters, function(x){
target_sequence <- matrix(as.numeric(factor(shuffled_sequences[[x]])), ncol = 2)
null <- simplify(graph_from_edgelist(unique(target_sequence), directed = TRUE))
temp <- sample_gnm(n = length(V(null)), m = length(E(null)), directed = TRUE)
(transitivity(null)/transitivity(temp))/(mean_distance(null)/mean_distance(temp))
})
#return objects
return(list(structured = structured_vals, null = null_vals))
}, mc.cores = 5)
#save indices
save(small_world, file = "data_models/small_world.RData")
Figure 2.6: The left panel shows transition network between syllables types for a single male (#22 from 1975) above the global transition network for the entire dataset. The right panel shows the estimated small-worldness index calculated 1,000 times from the real data (blue) compared to pseudorandom sequences with the same frequency distribution of syllable types (orange). The dashed vertical line at 1 corresponds to the standard threshold for a network having small-world structure (Humphries & Gurney, 2008).
Figure 2.6B shows the distribution of \(SWI\) calculated from random sequences with the same distribution of types (orange) and the actual sequences (blue). At all three levels of granularity, the observed small-worldness of the real songs is above both the standard threshold (\(SWI > 1\)) and the pseudorandom sequences.
The rate at which the mutual information between items decays with distance reflects whether underlying syntactic structure is Markovian, hierarchical, or a composite of the two (Sainburg et al., 2019). Mutual information (\(MI\)) was calculated from pairs of sequential syllables separated by a certain distance, using the method of Sainburg et al. (2019). Sainburg et al. (2019) concatenated all songs recorded in a particular day before calculating \(MI\). In our case, each day of recording represents several individuals recorded in several locations so concatenating sequences within days would lead to spurious pairs. Instead, I compute \(MI\) on concatenated sequences of songs from each individual. For example, if there are two individuals with two concatenated sequences of songs, \(a \to b \to c \to d\) and \(e \to f \to g\), and the target distance is 2, then the pairs used for \(MI\) calculated would be \(((a, c), (b, d), (e, g))\). \(MI\) would then be calculated using:
\[\begin{equation} \hat{I}(X, Y) = \hat{S}(X) + \hat{S}(Y) - \hat{S}(X, Y) \tag{2.4} \end{equation}\]
where \(\hat{I}\) denotes information and \(\hat{S}\) denotes entropy. \(X\) is the distribution of first syllables \((a, b, e)\), \(Y\) is the distribution of second syllables \((c, d, g)\), and \(XY\) is the joint distribution of pairs \(((a, c), (b, d), (e, g))\). \(\hat{S}\) was calculated using the method of Grassberger (2008) and Grassberger (2022) used by Lin & Tegmark (2017):
\[\begin{equation} \hat{S} = ln(N) - \frac{1}{N} \sum_{i=1}^{K} N_{i} \psi(N_{i}) \tag{2.5} \end{equation}\]
where \(N\) is the total number of tokens in the distribution, \(N_i\) is the number of tokens within each type \(i\) out of the total number of types \(K\), and \(\psi\) is the digamma function. The actual \(MI\) is then measured using:
\[\begin{equation} MI = \hat{I} - \hat{I}_{sh} \tag{2.6} \end{equation}\]
where \(\hat{I}_{sh}\) is the estimated lower bound of \(MI\) calculated from shuffled sequences, created by randomly permuting the concatenated sequences of individuals’ songs.
Sainburg et al. (2019) and Sainburg et al. (2022) simulated data from the hierarchical model of Lin & Tegmark (2017), the Markov model of Katahira et al. (2011), and their composite model in which Markov chains are nested within a larger hierarchical structure.
To determine whether the observed \(MI\) decay was consistent with a Markov process, hierarchical process, or both, I fit the following three decay models to the data:
\[\begin{equation} \textrm{exponential decay: } y = ae^{-xb} \tag{2.7} \end{equation}\] \[\begin{equation} \textrm{power-law decay: } y = cx^{d} \tag{2.8} \end{equation}\] \[\begin{equation} \textrm{composite decay: } y = ae^{-xb} + cx^{d} \tag{2.9} \end{equation}\]
where \(y\) is the estimated \(MI\) and \(x\) is the distance between syllables (Sainburg et al., 2019; Sainburg et al., 2022). Sainburg et al. (2019) included an intercept \(f\) in all three models, but I removed it because it led to overfitting issues (e.g. exponential model with an extra “knee” after the initial decay), did not significantly improve model fit (\(\Delta WAIC < 2\)), and was not included in Lin & Tegmark (2017). I focused my analysis on distances of up to 100 syllables to enable easy comparison with Sainburg et al. (2019). I followed Sainburg et al. (2019) in comparing the fit of each model at increasing distances from 100 to 1,200 (the longest individual sequence is 1,219), and found that the composite model outperforms both the exponential and power-law models at all distances (see Supplementary Information). Note that \(MI\) decay becomes noisy and begins to slowly increase past ~400 syllables. This is because only a small number of individuals have concatenated sequences longer than that (~4%), and \(MI\) is known to be systematically biased upwards at low sample sizes (Treves & Panzeri, 1995).
#load packages
library(brms)
#load data
load("data_models/processed_data.RData")
data <- processed_data$data
data$year <- strtrim(data$individual, 4)
#create function for the grassberger method of entropy estimation used by lin and tegmark (log is ln by default, which is what grassberger used originally)
s_calc <- function(vals){return(log(sum(vals)) - ((1/sum(vals))*sum(vals*digamma(vals))))}
#get unique songs and song lengths
unique_songs <- unique(data$song)
song_lengths <- as.numeric(sapply(unique_songs, function(x){length(which(data$song == x))}))
#get unique individuals and the lengths of their sequences
unique_inds <- unique(data$individual)
inds_lengths <- as.numeric(sapply(unique_inds, function(x){length(which(data$individual == x))}))
#set random seed
set.seed(12345)
#calculate mutual information across three levels of deep split
mi_data <- parallel::mclapply(1:3, function(i){
#use appropriate syllables
if(i == 1){data$syls <- data$cluster_2}
if(i == 2){data$syls <- data$cluster_3}
if(i == 3){data$syls <- data$cluster_4}
#compute actual mutual information curves
actual <- sapply(1:(max(inds_lengths) - 1), function(m){
#set m as the distance between syllables in the sequence
dist <- m
#store pairs within bouts
pairs <- do.call(rbind, lapply(unique_inds[which(inds_lengths > m)], function(r){
seq <- data$syls[which(data$individual == r)]
t(sapply(1:(length(seq)-dist), function(x){c(seq[x], seq[x+dist])}))
}))
#get frequencies of syllables in each column
n_x <- as.numeric(table(pairs[, 1]))
n_y <- as.numeric(table(pairs[, 2]))
#get frequencies of pairs in the data
pairs <- pairs[order(pairs[, 1], pairs[, 2]), ]
index <- !duplicated(pairs)
pair_freqs <- as.numeric(tapply(index, cumsum(index), length))
#calculate s for each
s_x <- s_calc(n_x)
s_y <- s_calc(n_y)
s_xy <- s_calc(pair_freqs)
#compute mutual information and return
return(s_x + s_y - s_xy)
})
#compute shuffled mutual information curves
shuffled <- sapply(1:(max(inds_lengths) - 1), function(m){
#set m as the distance between syllables in the sequence
dist <- m
#store pairs within bouts
pairs <- do.call(rbind, lapply(unique_inds[which(inds_lengths > m)], function(r){
seq <- sample(data$syls[which(data$individual == r)])
t(sapply(1:(length(seq)-dist), function(x){c(seq[x], seq[x+dist])}))
}))
#get frequencies of syllables in each column
n_x <- as.numeric(table(pairs[, 1]))
n_y <- as.numeric(table(pairs[, 2]))
#get frequencies of pairs in the data
pairs <- pairs[order(pairs[, 1], pairs[, 2]), ]
index <- !duplicated(pairs)
pair_freqs <- as.numeric(tapply(index, cumsum(index), length))
#calculate s for each
s_x <- s_calc(n_x)
s_y <- s_calc(n_y)
s_xy <- s_calc(pair_freqs)
#compute mutual information and return
return(s_x + s_y - s_xy)
})
#return mutual information decay, with shuffled baseline substracted from actual information
return(data.frame(dist = 1:length(actual), mi = c(actual - shuffled), mi_raw = actual, mi_base = shuffled))
}, mc.cores = 3)
#save output
save(mi_data, file = "data_models/mi_data.RData")
#load packages
library(brms)
#load data
load("data_models/mi_data.RData")
#set priors
exponential_priors <- prior(normal(0, 1), lb = 0, nlpar = "a") +
prior(normal(0, 1), lb = 0, nlpar = "b")
power_law_priors <- prior(normal(0, 1), lb = 0, nlpar = "c") +
prior(normal(0, 1), lb = 0, nlpar = "d")
composite_priors <- prior(normal(0, 1), lb = 0, nlpar = "a") +
prior(normal(0, 1), lb = 0, nlpar = "b") +
prior(normal(0, 1), lb = 0, nlpar = "c") +
prior(normal(0, 1), lb = 0, nlpar = "d")
#find the point at which the distances are no longer best fit by a composite model for each level of deep split
increments <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200)
waic_test <- lapply(1:3, function(h){
parallel::mclapply(increments, function(x){
exponential_test <- brm(bf(mi ~ a*exp((-dist)*b), a + b ~ 1, nl = TRUE),
data = mi_data[[h]][1:x, ], prior = exponential_priors, iter = 5000)
power_law_test <- brm(bf(mi ~ c*(dist^(-d)), c + d ~ 1, nl = TRUE),
data = mi_data[[h]][1:x, ], prior = power_law_priors, iter = 5000)
composite_test <- brm(bf(mi ~ a*exp((-dist)*b) + c*(dist^(-d)), a + b + c + d ~ 1, nl = TRUE),
data = mi_data[[h]][1:x, ], prior = composite_priors, iter = 5000)
waic_temp <- WAIC(exponential_test, power_law_test, composite_test)
return(c(waic_temp$loos$exponential_test$waic,
waic_temp$loos$power_law_test$waic,
waic_temp$loos$composite_test$waic))
}, mc.cores = 7)
})
#format and save
waic_test <- lapply(1:3, function(x){
temp <- data.frame(do.call(rbind, waic_test[[x]]))
colnames(temp) <- c("exponential", "power_law", "composite")
rownames(temp) <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200)
return(temp)
})
save(waic_test, file = "data_models/waic_test.RData")
#run models at all three levels of deep split
mi_models <- lapply(1:3, function(g){
#fit the exponential model
exponential_fit <- brm(bf(mi ~ a*exp((-dist)*b), a + b ~ 1, nl = TRUE),
data = mi_data[[g]][1:100, ], prior = exponential_priors,
iter = 5000, cores = 4)
#fit the power law model
power_law_fit <- brm(bf(mi ~ c*(dist^(-d)), c + d ~ 1, nl = TRUE),
data = mi_data[[g]][1:100, ], prior = power_law_priors,
iter = 5000, cores = 4)
#fit the composite model
composite_fit <- brm(bf(mi ~ a*exp((-dist)*b) + c*(dist^(-d)), a + b + c + d ~ 1, nl = TRUE),
data = mi_data[[g]][1:100, ], prior = composite_priors,
iter = 5000, cores = 4)
#calculate r2
r2 <- rbind(bayes_R2(exponential_fit), bayes_R2(power_law_fit), bayes_R2(composite_fit))
rownames(r2) <- c("exponential", "power_law", "composite")
#compute aic
waic <- WAIC(exponential_fit, power_law_fit, composite_fit)
waic <- c(waic$loos$exponential_fit$estimates[3, 1], waic$loos$power_law_fit$estimates[3, 1], waic$loos$composite_fit$estimates[3, 1])
#return data and model fits and aic
return(list(waic = waic,
r2 = r2,
priors = prior_summary(composite_fit),
exponential_fit = summary(exponential_fit)$fixed,
power_law_fit = summary(power_law_fit)$fixed,
composite_fit = summary(composite_fit)$fixed))
})
#save data and models
save(mi_models, file = "data_models/mi_models.RData")
DS | Model | WAIC | R-Sq |
|---|---|---|---|
2 | Exponential | -829 | 0.951 |
Power-Law | -766 | 0.910 | |
Composite | -840 | 0.952 | |
3 | Exponential | -617 | 0.982 |
Power-Law | -487 | 0.927 | |
Composite | -668 | 0.986 | |
4 | Exponential | -599 | 0.990 |
Power-Law | -432 | 0.926 | |
Composite | -688 | 0.993 |
#extract simulated values from sainburg et al.
model_fig <- xml2::read_xml("https://raw.githubusercontent.com/timsainb/ParallelsBirdsongLanguagePaper/master/figures/modelfig.svg")
xml2::xml_ns_strip(model_fig)
#convert raw values to actual values
raw_to_log <- function(y, range = c(281.188594, 6.838594), axis = c(10e-4, 10e1)){return(exp(scales::rescale(x = -c(range, y), from = -range, to = log(axis)))[-c(1:2)])}
#get heirarchical values
y_heirarchy <- as.numeric(xml2::xml_attr(xml2::xml_find_all(model_fig, xpath = "//*[@id='PathCollection_1']/g/use"), "y"))
y_heirarchy <- raw_to_log(y_heirarchy)
#get markov values
y_markov_a <- as.numeric(xml2::xml_attr(xml2::xml_find_all(model_fig, xpath = "//*[@id='PathCollection_2']/g/use"), "y"))[1:29]
y_markov_b <- as.numeric(xml2::xml_attr(xml2::xml_find_all(model_fig, xpath = "//*[@id='PathCollection_3']/g/use"), "y"))[1:12]
y_markov_a <- raw_to_log(y_markov_a)
y_markov_b <- raw_to_log(y_markov_b)
#get combined values
y_combined <- as.numeric(xml2::xml_attr(xml2::xml_find_all(model_fig, xpath = "//*[@id='PathCollection_4']/g/use"), "y"))
y_combined <- raw_to_log(y_combined)
#combine all simulated values into one data frame
mi_simulated <- data.frame(x = c(1:100, 1:12, 1:100),
y = c(y_heirarchy, y_markov_b, y_combined),
group = c(rep("heirarchy", 100), rep("markov", 12), rep("combined", 100)))
#save simulated values
save(mi_simulated, file = "data_models/mi_simulated.RData")
Figure 2.7: The top panel shows simulated mutual information decay curves from the Lin & Tegmark (2017) hierarchical model (left), the Katahira et al. (2011) Markov model (center), and the Sainburg et al. (2019) composite model (right) (data and inspiration for diagrams from Sainburg et al., 2019). The bottom panel shows the computed mutual information decay curves for the observed data at different levels of deep split (left, center, right). The solid line corresponds to the full composite model, while the dashed and dotted lines correspond to the exponential and power-law terms, respectively.
At all three levels of granularity, the composite model has a lower \(WAIC\) and higher \(R^2\) than both the exponential and power-law models (Table 2.2 and Figure 2.5), suggesting that mutual information decay in house finch song is more consistent with a combination of Markovian and hierarchical processes rather than with one or the other.
All three linguistic laws considered here are present in house finch song. Three out of the four measures of production cost, most importantly duration, are consistently and strongly negatively correlated with frequency, providing robust evidence for Zipf’s law of abbreviation. Menzerath’s law also found solid support, with a steeper negative relationship between song length and syllable duration than predicted by a model of production constraints (James et al., 2021). Together, these results show clear evidence for efficiency—syllables that are difficult to produce are less common and more likely to appear in shorter songs. Mandelbrot’s form of Zipf’s rank-frequency law provided a good fit to the data and has a more convex shape than the original form, which is consistent with studies of other non-human communication systems that have more redundancy than human language (Allen et al., 2019; Briefer et al., 2010; Cody et al., 2016; Hailman et al., 1985; Martins, 1994; McCowan et al., 1999). I will limit my interpretation of this result, given ongoing debates about the cause of the rank-frequency law, but it is notable that the goodness-of-fit of the Zipf-Mandelbrot distribution to house finch song (\(R^2 > 0.99\)) is within the range of human language (Linders & Louwerse, 2023; Piantadosi, 2014).
The two structural properties of language considered here are also found in house finch song. First, the syllable network has a small-world structure, characterized by high levels of clustering and low average path lengths, which is thought to reflect systematic structure and efficient recall in human language (Cancho & Solé, 2001; Motter et al., 2002; Steyvers & Tenenbaum, 2005). The small-worldness index of house finch song is within the range seen in humpback whales (Allen et al., 2019) and other songbirds at the first two levels of granularity (Beckage et al., 2011; Cody et al., 2016; Hedley, 2016; Sasahara et al., 2012; Weiss et al., 2014) (\(SWI \sim 1.69-4.7\)), but is much higher when syllables are over-split into types (\(SWI \sim 10-11\)). Second, the decay in mutual information between syllables with increasing distance has a similar shape to that found in several human languages and songbird species (Sainburg et al., 2019), which is associated with Markov chains nested within a larger hierarchical structure. Historically, birdsong sequences were assumed to follow simple Markov chains (Berwick et al., 2011; Katahira et al., 2011; Leonardo & Konishi, 1999). This result lends support to an emerging consensus that song sequences are actually much more complex, containing long-range dependencies and hierarchical processes that parallel human language (Kershenbaum et al., 2014; Morita et al., 2021; Sainburg et al., 2019; Searcy et al., 2022). In combination, the small-worldness and mutual information decay of house finch song sequences suggest that they exhibit the kind of systematic structure that is thought to maximize expressivity while reducing learning costs (Allen et al., 2019; Kirby et al., 2015; Raviv et al., 2021), making it easier for more information to “pass through the bottleneck” of social learning (Gruber et al., 2022).
Notably, these patterns are consistently found across three levels of granularity in syllable clustering, pointing to a limited level of scale invariance in house finch song. Scale invariance refers to self-similarity in organization across levels of analysis, and is hypothesized to be a key indicator of complex systems (Stanley et al., 2000) including language and speech (Gervain & Geffen, 2019; Mehri & Jamaati, 2021), behavior (Bialek & Shaevitz, 2023), and cognition (Chater & Brown, 1999). I use the term “limited”, though, because here it is consistency across syllable classifications rather than across levels of hierarchy (e.g. phrases, songs, bouts). I have heard others studying the cultural evolution of birdsong refer to this idea as “fractal equivalency”—different resolutions of clustering should show similar statistical signals of organization (D. C. Lahti, personal communication, 2020). A practical conclusion of this finding is that information theoretic measures may not be reliable indicators of clustering quality in non-human communication systems (Kershenbaum et al., 2016), although this should be explored in other species.
A long-standing critique of Zipf’s laws is that they may be statistical artifacts of other processes (Caplan et al., 2020), starting with Miller’s observation that randomly typing on keyboards can produce similar patterns (Miller, 1957). That being said, random typing accounts are not realistic causal descriptions of how communication systems emerge, and there are good empirical reasons to doubt that they undermine efficiency accounts (Piantadosi, 2014). Randomly-generated texts produce rank-frequency distributions that differ from those in real corpora (Ferrer-i-Cancho & Elvevåg, 2010), random typing models are not truly neutral as they can be mathematically reframed as minimizing costs (Ferrer-i-Cancho, 2016; Ferrer-i-Cancho et al., 2022), and there is direct experimental evidence that Zipfian abbreviation emerges from pressure for efficient communication (Kanwal et al., 2017). In my view, the most important contribution of the random typing account is to highlight that the problem of equifinality—different processes leading to similar outcomes (Barrett, 2019)—means that the presence of Zipf’s laws are not sufficient to make conclusions about efficiency (Kanwal, 2017; Semple et al., 2022). Multiple lines of evidence should be presented alongside other work demonstrating that efficiency is shaping the system (e.g. physical (Mann et al., 2020) and environmental (Bermúdez‐Cuamatzin et al., 2023) constraints), as I have done here. See Semple et al. (2022), Piantadosi (2014), and Kanwal (2017) for more complete summaries of this debate.
Outside of linguistics, efficiency and complexity are often discussed in relation to cumulative cultural evolution (CCE). Definitions of CCE vary and a full review is outside of the scope of this study, but for convenience we will use the definition of Williams et al. (2022): “the accumulation of sequential changes within a single socially learned behavior that results in improved function”. Discussions of CCE often focus on increasing complexity over time (Wilks & Blakey, 2018), which was once thought to be a hallmark of human culture (Tomasello, 1999) but has now been observed in several non-human communication systems including humpback whale (Garland et al., 2022) and Savannah sparrow song (Williams et al., 2022). Gruber et al. (2022) make a convincing argument that efficiency deserves more attention in CCE, as increases in complexity in one domain require increases in efficiency in another (see Equation (1.1) in the Introduction). House finch song may be a good research model for how the interplay between efficiency and complexity drives CCE, as male house finches have a social learning bias for more complex syllables (Youngblood & Lahti, 2022), possibly as an adaptation to female preferences for more complex songs (Ciaburri & Williams, 2019; Mennill et al., 2006; Nolan & Hill, 2004), and there appears to be pressure for efficiency at the level of both syllables and songs. That being said, CCE may not be the best framework for understanding the interaction between efficiency and complexity in birdsong, as its logic is more difficult to apply to “aesthetic” behavior (Sinclair et al., 2022) especially when it is optimized for female preferences that evolve to maximize inclusive fitness rather than the specific properties of songs that males sing (Geller & Lahti, in press).
This work was inspired in large part by conversations with Olivier Morin, Alexey Koshevoy, and other attendees of the Communicative Efficiency Workshop hosted by the Max Planck Institute for Geoanthropology in April 2023.
The data for this study come from Youngblood & Lahti (2022) (https://github.com/masonyoungblood/TransmissionBias). The analysis code for this study is available in the source R Markdown file in the GitHub repository (https://github.com/masonyoungblood/linguistic_efficiency).
Parameter | Class | Prior | Lower Bound |
|---|---|---|---|
a | b | normal(0, 10) | 1 |
b | b | normal(0, 10) | -1 |
c | b | normal(0, 10) | 0 |
sigma | student_t(3, 0, 2.5) | 0 |
DS | Zipf | Zipf-Mandelbrot |
|---|---|---|
1 | -799 | -1,075 |
2 | -5,328 | -7,817 |
3 | -17,566 | -25,617 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 1.01 | 1.00 | 1.03 | 1 | 5,510 | 3,580 |
c | 0.23 | 0.22 | 0.24 | 1 | 7,223 | 6,440 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 2.20 | 2.03 | 2.39 | 1 | 2,129 | 2,702 |
b | 4.66 | 4.00 | 5.41 | 1 | 2,132 | 2,653 |
c | 9.43 | 5.22 | 16.68 | 1 | 2,102 | 2,515 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 1.00 | 1.00 | 1.00 | 1 | 6,264 | 4,253 |
c | 0.08 | 0.08 | 0.09 | 1 | 7,540 | 5,954 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 1.20 | 1.18 | 1.22 | 1 | 2,132 | 1,965 |
b | 5.69 | 5.42 | 5.98 | 1 | 2,187 | 2,477 |
c | 0.52 | 0.48 | 0.56 | 1 | 2,094 | 2,123 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 1.00 | 1.00 | 1.00 | 1 | 6,266 | 4,117 |
c | 0.03 | 0.03 | 0.03 | 1 | 8,434 | 7,188 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 1.08 | 1.07 | 1.09 | 1 | 2,079 | 2,879 |
b | 17.29 | 16.79 | 17.79 | 1 | 2,076 | 3,021 |
c | 0.34 | 0.32 | 0.36 | 1 | 2,032 | 2,774 |
Class | Prior | Lower Bound |
|---|---|---|
b | normal(0, 0.5) | |
Intercept | normal(0, 4) | 0 |
sd | normal(0, 0.5) | 0 |
sigma | normal(0, 0.5) | 0 |
Outcome | DS | Predictor | Rhat | Bulk ESS | Tail ESS |
|---|---|---|---|---|---|
Duration | 2 | Intercept | 1 | 712 | 913 |
Count | 1 | 794 | 1,064 | ||
3 | Intercept | 1 | 451 | 915 | |
Count | 1 | 566 | 1,069 | ||
4 | Intercept | 1 | 268 | 580 | |
Count | 1 | 302 | 597 | ||
Bandwidth | 2 | Intercept | 1 | 835 | 1,801 |
Count | 1 | 1,047 | 2,088 | ||
3 | Intercept | 1 | 184 | 448 | |
Count | 1 | 230 | 586 | ||
4 | Intercept | 1 | 170 | 406 | |
Count | 1 | 230 | 525 | ||
Concavity | 2 | Intercept | 1 | 1,059 | 2,059 |
Count | 1 | 1,354 | 2,695 | ||
3 | Intercept | 1 | 1,307 | 2,561 | |
Count | 1 | 1,546 | 3,209 | ||
4 | Intercept | 1 | 723 | 1,469 | |
Count | 1 | 775 | 1,620 | ||
Excursion | 2 | Intercept | 1 | 1,301 | 2,456 |
Count | 1 | 1,508 | 2,661 | ||
3 | Intercept | 1 | 735 | 1,398 | |
Count | 1 | 855 | 1,592 | ||
4 | Intercept | 1 | 403 | 771 | |
Count | 1 | 479 | 956 |
Class | Prior | Lower Bound |
|---|---|---|
b | normal(0, 0.1) | |
Intercept | normal(0, 3) | 0 |
sd | normal(0, 0.5) | 0 |
sigma | normal(0, 0.5) | 0 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
Intercept | 4.48 | 4.47 | 4.49 | 1 | 6,740 | 7,103 |
Song Length | -0.05 | -0.07 | -0.04 | 1 | 4,794 | 6,044 |
Note that the effective sample sizes and Rhats from brm_multiple are known to be unreliable. If we instead take a look at the Rhats for each submodel we can see that they are all close to 1, which indicates that the individual models have converged.
Simple Null Model | Production Null Model | |||
|---|---|---|---|---|
Dataset | Intercept | Song Length | Intercept | Song Length |
1 | 0.9998875 | 0.9996983 | 0.9997456 | 1.000656 |
2 | 0.9997542 | 0.9997727 | 0.9999726 | 1.000328 |
3 | 1.0000627 | 0.9997787 | 0.9998281 | 1.001598 |
4 | 0.9997391 | 0.9997015 | 1.0005752 | 1.000895 |
5 | 0.9999400 | 1.0004971 | 0.9997567 | 1.000817 |
6 | 0.9999405 | 0.9998756 | 0.9997009 | 1.000332 |
7 | 0.9999780 | 0.9998101 | 0.9997015 | 1.001533 |
8 | 1.0005872 | 0.9998251 | 0.9998900 | 1.000225 |
9 | 0.9999938 | 0.9997752 | 0.9998704 | 1.001099 |
10 | 0.9997444 | 0.9998224 | 1.0001865 | 1.002680 |
Parameter | Prior | Lower Bound |
|---|---|---|
a | normal(0, 1) | 0 |
b | normal(0, 1) | 0 |
c | normal(0, 1) | 0 |
d | normal(0, 1) | 0 |
2 | 3 | 4 | |||||||
|---|---|---|---|---|---|---|---|---|---|
Exp | PL | Comp | Exp | PL | Comp | Exp | PL | Comp | |
100 | -829 | -765 | -841 | -617 | -486 | -668 | -598 | -432 | -688 |
200 | -1,557 | -1,513 | -1,560 | -1,232 | -1,068 | -1,281 | -1,201 | -967 | -1,267 |
300 | -2,101 | -2,086 | -2,114 | -1,635 | -1,555 | -1,695 | -1,626 | -1,468 | -1,686 |
400 | -2,683 | -2,676 | -2,712 | -2,008 | -1,977 | -2,131 | -2,082 | -1,950 | -2,142 |
500 | -2,984 | -2,988 | -3,073 | -2,078 | -2,093 | -2,276 | -2,260 | -2,217 | -2,387 |
600 | -3,114 | -3,176 | -3,248 | -2,264 | -2,299 | -2,527 | -2,454 | -2,449 | -2,639 |
700 | -3,291 | -3,430 | -3,471 | -2,493 | -2,539 | -2,808 | -2,650 | -2,669 | -2,901 |
800 | -3,620 | -3,734 | -3,766 | -2,479 | -2,598 | -2,798 | -2,209 | -2,240 | -2,485 |
900 | -3,615 | -3,785 | -3,801 | -2,108 | -2,379 | -2,467 | -1,568 | -1,804 | -1,886 |
1000 | -3,712 | -3,931 | -3,942 | -1,958 | -2,350 | -2,408 | -1,235 | -1,608 | -1,654 |
1200 | -3,175 | -3,244 | -3,246 | -1,483 | -1,711 | -1,731 | -823 | -1,084 | -1,104 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 0.17 | 0.16 | 0.19 | 1 | 4,336 | 4,595 |
b | 0.39 | 0.35 | 0.44 | 1 | 4,511 | 4,996 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
c | 0.13 | 0.12 | 0.14 | 1 | 6,738 | 7,059 |
d | 1.08 | 1.01 | 1.15 | 1 | 6,437 | 6,285 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 0.16 | 0.12 | 0.18 | 1 | 3,354 | 3,685 |
b | 0.43 | 0.38 | 0.48 | 1 | 4,592 | 4,387 |
c | 0.02 | 0.00 | 0.04 | 1 | 3,021 | 2,579 |
d | 0.66 | 0.29 | 0.96 | 1 | 3,237 | 2,798 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 0.60 | 0.57 | 0.64 | 1 | 4,478 | 5,210 |
b | 0.27 | 0.26 | 0.29 | 1 | 4,453 | 5,343 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
c | 0.52 | 0.49 | 0.56 | 1 | 6,895 | 5,998 |
d | 0.95 | 0.90 | 1.00 | 1 | 7,133 | 7,055 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 0.53 | 0.47 | 0.59 | 1 | 2,820 | 3,907 |
b | 0.30 | 0.28 | 0.32 | 1 | 4,896 | 4,646 |
c | 0.08 | 0.04 | 0.13 | 1 | 2,882 | 2,864 |
d | 0.61 | 0.42 | 0.77 | 1 | 2,978 | 3,096 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 0.74 | 0.71 | 0.77 | 1 | 3,787 | 4,572 |
b | 0.24 | 0.23 | 0.25 | 1 | 4,287 | 5,362 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
c | 0.69 | 0.64 | 0.73 | 1 | 6,835 | 6,785 |
d | 0.92 | 0.87 | 0.97 | 1 | 6,638 | 6,848 |
Param. | Estimate | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|
a | 0.65 | 0.59 | 0.71 | 1 | 2,387 | 2,851 |
b | 0.26 | 0.25 | 0.27 | 1 | 4,566 | 4,959 |
c | 0.10 | 0.05 | 0.15 | 1 | 2,433 | 2,760 |
d | 0.61 | 0.45 | 0.74 | 1 | 2,360 | 2,726 |